
Python Data Analytics by 2023

Author: 2023, Date: October 18, 2023, Views: 166

Language: eng
Format: epub

Chapter 6 - pandas in Depth: Data Manipulation

In addition, after an aggregation, the resulting column names may no longer be very meaningful.

It is often useful to add a prefix to each column name that describes the type of aggregation performed.

Adding a prefix, rather than replacing the name completely, keeps track of the source columns from which the aggregate values derive. This matters when you apply a chain of transformations (in which one series or dataframe is generated from another), because each result keeps a reference to its source data.

>>> means = frame.groupby('color').mean(numeric_only=True).add_prefix('mean_')
>>> means
       mean_price1  mean_price2
color
green        2.025        2.375
red          2.380        2.435
white        5.560        4.750
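The value of prefixes in a chain of transformations can be sketched as follows. The definition of `frame` is not shown in this excerpt, so the one below is a hypothetical stand-in whose values were chosen to be consistent with the output above; `totals` and `summary` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for `frame` (assumption: the real definition is
# not part of this excerpt; values chosen to match the printed output).
frame = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15],
})

grouped = frame.groupby('color')

# Two different aggregations of the same source columns.
means = grouped.mean(numeric_only=True).add_prefix('mean_')
totals = grouped.sum(numeric_only=True).add_prefix('sum_')

# Without the prefixes, both results would carry the same column names
# (price1, price2) and could not be told apart after concatenation.
summary = pd.concat([means, totals], axis=1)
print(summary)
```

Because each prefix names the operation, every column in `summary` still points back to the source column it was computed from.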

Functions on Groups

Although many methods were not implemented specifically for use with GroupBy, they work correctly on the data structures it produces, such as series. You saw in the previous section how easy it is to obtain a series from a GroupBy object by specifying the column name and then applying a method to perform the calculation. For example, you can calculate quantiles with the quantile() function.

>>> group = frame.groupby('color')
>>> group['price1'].quantile(0.6)
color
green    2.170
red      2.744
white    5.560
Name: price1, dtype: float64
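As a small extension of the call above, quantile() also accepts a list of probabilities, in which case the result is indexed by both group and quantile. The sketch below assumes a hypothetical `frame` (its definition is not part of this excerpt; the values were chosen to be consistent with the output shown), and the names `q60` and `quartiles` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for `frame` (assumption, not taken from the text).
frame = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15],
})

group = frame.groupby('color')

# A single quantile gives one value per group.
q60 = group['price1'].quantile(0.6)

# A list of quantiles gives a series with a (color, quantile) MultiIndex;
# unstack() pivots the quantile level into columns.
quartiles = group['price1'].quantile([0.25, 0.75]).unstack()

print(q60)
print(quartiles)
```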

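You can also define your own aggregation functions. Define the function separately and then pass it as an argument to the agg() function. For example, you can calculate the range of the values in each group. Note that naming the function range, as in this example, shadows Python's built-in range() for the rest of the session.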

>>> def range(series):
...     return series.max() - series.min()
...
>>> group['price1'].agg(range)
color
green    1.45
red      3.64
white    0.00
Name: price1, dtype: float64
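The same custom aggregation can be written as a script that leaves the built-in range() untouched. This is a minimal sketch: `value_range` is a hypothetical name chosen to avoid the clash, and `frame` is an assumed stand-in whose values are consistent with the output above.

```python
import pandas as pd

# Hypothetical stand-in for `frame` (assumption, not taken from the text).
frame = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15],
})


def value_range(series):
    """Return the spread (max - min) of the values in one group."""
    return series.max() - series.min()


group = frame.groupby('color')

# agg() calls value_range once per group, passing that group's values
# as a series, and collects the results into one series.
spread = group['price1'].agg(value_range)

# A lambda works as well for one-off calculations.
spread_lambda = group['price1'].agg(lambda s: s.max() - s.min())

print(spread)
```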

You can also apply several aggregation functions at once by passing agg() a list of the operations to perform; each one becomes a column in the result.

>>> group['price1'].agg(['mean','std',range])
        mean       std  range
color
green  2.025  1.025305   1.45
red    2.380  2.573869   3.64
white  5.560       NaN   0.00
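When passing a list, the column names come from the function names. If you would rather choose the names yourself, pandas also supports keyword-style ("named") aggregation. The sketch below assumes a hypothetical `frame` consistent with the output above; `value_range`, `stats`, and the column name `spread` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for `frame` (assumption, not taken from the text).
frame = pd.DataFrame({
    'color': ['white', 'red', 'green', 'red', 'green'],
    'price1': [5.56, 4.20, 1.30, 0.56, 2.75],
    'price2': [4.75, 4.12, 1.60, 0.75, 3.15],
})


def value_range(series):
    """Return the spread (max - min) of the values in one group."""
    return series.max() - series.min()


group = frame.groupby('color')

# Each keyword becomes a column name; each value is the aggregation to
# apply, given either as a string alias or as a callable.
stats = group['price1'].agg(mean='mean', std='std', spread=value_range)
print(stats)
```

The std of the white group is NaN because that group holds a single value, so its sample standard deviation is undefined.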
